1 Introduction

Submodularity is a fundamental phenomenon in combinatorial optimization. Submodular func- 
tions occur in a variety of combinatorial settings such as coverage problems, cut problems, welfare 
maximization, and many more. Therefore, a lot of work has been concerned with maximizing or 
minimizing a submodular function, often subject to combinatorial constraints. Many of these al- 
gorithmic results exhibit a common structure. Namely, the function is extended to a continuous, 
usually non-linear, function on a convex domain. Then, this relaxation is solved, and the frac- 
tional solution rounded to yield an integral solution. Often, the continuous extension has a natural 
interpretation in terms of distributions on the ground set. This interpretation is often crucial to 
the results and their analysis. The purpose of this survey is to highlight this connection between 
extensions, distributions, relaxations, and optimization in the context of submodular functions. 

Contributions The purpose of this survey is to present a common framework for viewing many
of the results on optimizing submodular functions. Therefore, most of the results mentioned -
with the exception of those in Section 4.3 - are either already published, folklore, or easily obtained
by existing techniques. In the first case, citations are provided. Nevertheless, for most of these
results we present alternate, hopefully simplified statements and proofs that present a more unified
picture. In Section 4.3 we present a new result for minimizing symmetric submodular functions
subject to a cardinality constraint.



2 Preliminaries 

2.1 Submodular Functions 

We begin with some definitions. We consider a ground set $X$ with $|X| = n$. A set function on $X$
is a function $f : 2^X \to \mathbb{R}$.

*This is a slightly revised version of the author's PhD Qualifying Exam Report, the original version of which was 
submitted to the Department of Computer Science at Stanford University in December of 2009. The Qualifying exam 
committee consisted of Serge Plotkin, Tim Roughgarden (Thesis Advisor) and Jan Vondrak. 
Definition 2.1. A set function $f : 2^X \to \mathbb{R}$ is submodular if, for all $A, B \subseteq X$,

$$f(A \cap B) + f(A \cup B) \le f(A) + f(B)$$

Equivalently, a submodular function can be defined as a set function exhibiting diminishing
marginal returns.

Definition 2.2. A set function $f : 2^X \to \mathbb{R}$ is submodular if, for all $A, B \subseteq X$ with $A \subseteq B$, and
for all $j \in X \setminus B$,

$$f(A \cup \{j\}) - f(A) \ge f(B \cup \{j\}) - f(B)$$

The fact that the first definition implies the second can be easily checked by a simple algebraic
manipulation. The other direction can be shown by a simple induction on $|A \cup B| - |A \cap B|$.

We distinguish additional properties of set functions that will prove useful. We say $f : 2^X \to \mathbb{R}$
is nonnegative if $f(S) \ge 0$ for all $S \subseteq X$. $f$ is normalized if $f(\emptyset) = 0$. $f$ is monotone if $f(S) \le f(T)$
whenever $S \subseteq T$. Moreover, $f$ is symmetric if $f(S) = f(X \setminus S)$ for all $S \subseteq X$.

Algorithmic results on optimizing submodular functions can often be stated in the general value 
oracle model. This encapsulates most special cases of these functions that arise in practice. In the 
value oracle model, access to $f$ is via value queries: the algorithm may query for the value of $f(S)$
for any $S$.

We conclude with some more concrete examples of submodular functions that arise in practice.
We say $f$ is a coverage function when elements of $X$ are sets over some other ground set $Y$,
and $f(S) = |\bigcup_{U \in S} U|$. Therefore, problems such as the max-k-cover problem can be thought of as
maximizing a coverage function subject to a cardinality constraint of $k$. Another class of submodular
functions is cut functions. A set function $f : 2^V \to \mathbb{Z}$ is a cut function of a graph $G = (V, E)$ if $f(U)$
is the number of edges of $G$ crossing the cut $(U, V \setminus U)$. This can be generalized to hypergraphs.
Moreover, weighted versions of both coverage functions and cut functions are also submodular.
There are many other examples of submodular functions, for which we refer the reader to the
thorough treatment in [6].
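As a sanity check on Definitions 2.1 and 2.2 and on the two examples above, the following Python sketch brute-forces both conditions on tiny instances. The instances (`COVER`, `EDGES`) and all function names are illustrative assumptions, not taken from the survey, and the enumeration is exponential in $n$, so it is only meant for very small ground sets.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Definition 2.1: f(A & B) + f(A | B) <= f(A) + f(B) for all A, B."""
    all_sets = list(subsets(ground))
    return all(f(a & b) + f(a | b) <= f(a) + f(b)
               for a in all_sets for b in all_sets)

def has_diminishing_returns(f, ground):
    """Definition 2.2: marginal gains can only shrink as the base set grows."""
    all_sets = list(subsets(ground))
    for a in all_sets:
        for b in all_sets:
            if a <= b:
                for j in set(ground) - b:
                    if f(a | {j}) - f(a) < f(b | {j}) - f(b):
                        return False
    return True

# Hypothetical coverage instance: elements of X are named sets over Y.
COVER = {"u1": {1, 2}, "u2": {2, 3}, "u3": {3, 4, 5}}

def coverage(s):
    """Number of elements of Y covered by the sets named in s."""
    covered = set()
    for name in s:
        covered |= COVER[name]
    return len(covered)

# Hypothetical graph: the 4-cycle 0-1-2-3-0.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]

def cut(s):
    """Number of edges with exactly one endpoint in s."""
    return sum((u in s) != (v in s) for u, v in EDGES)

def square_size(s):
    """|S|^2 has increasing marginal returns, so it is not submodular."""
    return len(s) ** 2
```

On these instances both checks agree: `coverage` and `cut` pass, while `square_size` fails both.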

2.2 Polytopes and Integrality Gaps 

A set $P \subseteq \mathbb{R}^n$ is a polytope if it is the convex hull of a finite number of points, known as the
vertices of $P$, in $\mathbb{R}^n$. Equivalently, $P \subseteq \mathbb{R}^n$ is a polytope if and only if it is a bounded intersection of a finite
number of halfspaces in $\mathbb{R}^n$. Polytopes are convex sets, and are central objects in combinatorial
optimization.

We will consider optimizing continuous functions over polytopes. In the context of maximization
problems, we say a function $F : D \to \mathbb{R}$ with $P \subseteq D \subseteq \mathbb{R}^n$ has integrality gap $\alpha$ relative to polytope
$P$ if

$$\frac{\max \{F(x) : x \in P\}}{\max \{F(x) : x \in P,\ x \in \mathbb{Z}^n\}} = \alpha$$

If $F$ has integrality gap 1 relative to $P$, we say it has no integrality gap relative to $P$.

2.3 Matroids 

A Set System is a pair $(X, \mathcal{I})$, where $X$ is the ground set, and $\mathcal{I}$ is a family of subsets of $X$. A
special class of set systems, known as matroids, are of particular interest. When $M = (X, \mathcal{I})$ is a
matroid, we refer to elements of $\mathcal{I}$ as the independent sets of $M$.
Definition 2.3. A matroid is a set system $(X, \mathcal{I})$ that satisfies

• Downwards Closure: If $T \in \mathcal{I}$ and $S \subseteq T$ then $S \in \mathcal{I}$.

• Exchange Property: If $S, T \in \mathcal{I}$, and $|T| > |S|$, then there is $y \in T \setminus S$ such that $S \cup \{y\} \in \mathcal{I}$.

Given a matroid $M = (X, \mathcal{I})$, we define the matroid polytope $P(M) \subseteq [0,1]^X$ as the convex hull
of the indicator vectors of the independent sets of $M$.
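To make the two axioms of Definition 2.3 concrete, here is a small Python sketch that checks them by brute force on an explicitly listed family of sets. The function names and the two example families are hypothetical and chosen only for illustration. The sketch assumes the family is given as a list of frozensets.

```python
from itertools import combinations

def powerset(ground):
    """All subsets of `ground`, as frozensets."""
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_downward_closed(family):
    """Downwards Closure: every subset of a member is itself a member."""
    fam = set(family)
    return all(frozenset(sub) in fam
               for t in fam
               for r in range(len(t) + 1)
               for sub in combinations(t, r))

def has_exchange_property(family):
    """Exchange Property: a larger member can always extend a smaller one."""
    fam = set(family)
    for s in fam:
        for t in fam:
            if len(t) > len(s) and not any((s | {y}) in fam for y in t - s):
                return False
    return True

def is_matroid(family):
    return is_downward_closed(family) and has_exchange_property(family)

# Uniform matroid on 4 elements: all sets of size at most 2 are independent.
UNIFORM_2_4 = [s for s in powerset(range(4)) if len(s) <= 2]

# Downward closed, but {0} cannot be extended using an element of {1, 2}.
NOT_A_MATROID = [frozenset(), frozenset({0}), frozenset({1}),
                 frozenset({2}), frozenset({1, 2})]
```

The second family shows that downward closure alone does not suffice: it fails only the exchange property.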



Edmonds showed that an equivalent characterization of $P(M)$ can be given in terms of the
rank function of the matroid. The rank function $r_M : 2^X \to \mathbb{Z}$ of matroid $M$ is the integer-valued
submodular function defined by

$$r_M(S) = \max \{|T| : T \in \mathcal{I},\ T \subseteq S\}$$

Using the rank function, the matroid polytope can be equivalently characterized as follows. For a
vector $x \in \mathbb{R}^X$ and $S \subseteq X$, we use $x(S)$ to denote $\sum_{i \in S} x_i$.

$$P(M) = \{x \in \mathbb{R}_+^X : x(S) \le r_M(S) \text{ for all } S \subseteq X\}$$

We note that the vertices of the matroid polytope are all integral, by the first definition.

3 Extensions and Distributions

An extension of a set function $f : 2^X \to \mathbb{R}$ is some function from the hypercube $[0,1]^X$ to $\mathbb{R}$
that agrees with $f$ on the vertices of the hypercube. We survey various extensions of submodular
functions, and connect them to distributions on subsets of the ground set.

3.1 The Convex Closure, Lovasz Extension, and Chain Distributions

In this section, we will define the convex closure of any set function, and reduce minimization of
the set function to minimization of its convex closure. Then, we will show that, for submodular
functions, the convex closure has a simple form that can be evaluated efficiently at any point, and
thus minimized efficiently.

3.1.1 The Convex Closure

For any set function $f : 2^X \to \mathbb{R}$, be it submodular or not, we can define its convex closure $f^-$.
Intuitively, $f^-$ can be constructed from $f$ by first plotting $f$ in $\mathbb{R}^{|X|+1}$, and then placing a "blanket"
under the resulting graph and pulling up until the blanket is taut. Formally, $f^-$ can be defined as
follows.

Definition 3.1. For a set function $f : 2^X \to \mathbb{R}$, the convex closure $f^- : [0,1]^X \to \mathbb{R}$ is the
point-wise highest convex function from $[0,1]^X$ to $\mathbb{R}$ that always lowerbounds $f$.
It remains to show that the convex closure exists and is well-defined. Observe that the maximum of
any number (even infinite) of convex functions is again a convex function. Moreover, the maximum
of any number (even infinite) of functions lowerbounding $f$ is also a function lowerbounding $f$.
This establishes existence and uniqueness of the convex closure, as needed. We can equivalently
define the convex closure in terms of distributions on subsets of $X$.

Definition 3.2. Fix a set function $f : 2^X \to \mathbb{R}$. For every $x \in [0,1]^X$, let $D_f^-(x)$ denote a
distribution over $2^X$, with marginals $x$, minimizing $\mathbb{E}_{S \sim D_f^-(x)}[f(S)]$ (breaking ties arbitrarily). The
Convex Closure $f^-$ can be defined as follows: $f^-(x)$ is the expected value of $f(S)$ over draws $S$
from $D_f^-(x)$.

To see that the two definitions are equivalent, let us use $f_1^-$ and $f_2^-$ to denote the convex closure
as defined in Definitions 3.1 and 3.2, respectively. First, since the epigraph of a convex function is a
convex set, it immediately follows that $f_1^-$ lowerbounds $f_2^-$. Moreover, it is easy to see that $f_2^-(x)$
is the minimum of a simple linear program with $x$ in the constraint vector; thus $f_2^-$ is convex by
elementary convex analysis. Since $f_2^-$ is then a convex function lowerbounding $f$, it is in turn
lowerbounded by $f_1^-$. Combining these two facts, we get that $f_1^- = f_2^-$, as needed.

Next, we mention some simple facts about the convex closure of $f$. First, it is apparent from
Definition 3.2 that the convex closure is indeed an extension. Namely, it agrees with $f$ on all the
integer points. Moreover, it also follows from Definition 3.2 that $f^-$ only takes on values that
correspond to distributions on $2^X$, and thus the minimum of $f^-$ is attained at an integer point.
This gives the following useful connection between the discrete function and its extension.

Proposition 3.3. The minimum values of $f$ and $f^-$ are equal. If $S$ is a minimizer of $f$, then
$\mathbf{1}_S$ is a minimizer of $f^-$. Moreover, if $x$ is a minimizer of $f^-$, then every set in the support of
$D_f^-(x)$ is a minimizer of $f$.

3.1.2 The Lovasz Extension and Chain Distributions 

In this section, we will describe an extension $\mathcal{L}_f : [0,1]^X \to \mathbb{R}$, defined by Lovasz, of an
arbitrary set function $f : 2^X \to \mathbb{R}$. In the next section we will show that, when $f$ is submodular,
$\mathcal{L}_f = f^-$. We define $\mathcal{L}_f$ as follows.

Definition 3.4. ([6]) Fix $x \in [0,1]^X$, and let $X = \{v_1, v_2, \ldots, v_n\}$ such that $x(v_1) \ge x(v_2) \ge \ldots \ge
x(v_n)$. For $0 \le i \le n$, let $S_i = \{v_1, \ldots, v_i\}$. Let $\{\lambda_i\}_{i=0}^n$ be the unique coefficients with $\lambda_i \ge 0$ and
$\sum_i \lambda_i = 1$ such that:

$$x = \sum_{i=0}^{n} \lambda_i \mathbf{1}_{S_i}$$

It is easy to see that $\lambda_n = x(v_n)$, and for $0 < i < n$ we have $\lambda_i = x(v_i) - x(v_{i+1})$, and $\lambda_0 = 1 - x(v_1)$.
The value of the Lovasz extension of $f$ at $x$ is defined as

$$\mathcal{L}_f(x) = \sum_{i} \lambda_i f(S_i)$$

We can interpret the Lovasz Extension as follows. Given a set of marginal probabilities $x \in
[0,1]^X$ on elements of $X$, we construct a particular distribution $D^{\mathcal{L}}(x)$ on $2^X$ satisfying these
marginals. Intuitively, this distribution puts as much probability mass as possible on the large subsets of
$X$, subject to obeying the marginals. Therefore, the largest possible set $X = S_n$ gets as much
probability mass as possible subject to the smallest marginal $x(v_n)$. When the marginal probability
$x(v_n)$ of $v_n$ has been "saturated", we put as much mass as possible on the next largest set $S_{n-1}$.
It is easy to see that the next element saturated is $v_{n-1}$, after we place $x(v_{n-1}) - x(v_n)$ probability
mass on $S_{n-1}$. And so on and so forth. Now, it is easy to see that $\mathcal{L}_f(x)$ is simply the expected
value of $f$ on draws from the distribution $D^{\mathcal{L}}(x)$.

A note on the distributions $D^{\mathcal{L}}(\cdot)$ defining the Lovasz extension. Notice that the definition of
$D^{\mathcal{L}}(x)$ is oblivious, in that it does not depend on the particular function $f$. Moreover, notice that
the support of $D^{\mathcal{L}}(x)$ is a chain: a nested family of sets. We call such a distribution a chain
distribution on $2^X$. The following easy fact will be useful later.

Fact 3.5. The distribution $D^{\mathcal{L}}(x)$ is the unique chain distribution on $2^X$ with marginals $x$.
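
The construction in Definition 3.4 translates almost line by line into code. The following Python sketch builds the chain distribution for a given vector of marginals and evaluates the Lovasz extension as the corresponding expectation. The function names and the dictionary representation of $x$ are assumptions made for illustration.

```python
def chain_distribution(x):
    """Return the chain distribution with marginals x as a list of (lambda_i, S_i).

    `x` maps each element of the ground set to its marginal in [0, 1].
    """
    order = sorted(x, key=lambda v: x[v], reverse=True)   # v_1, ..., v_n
    # Pad with x(v_0) = 1 and x(v_{n+1}) = 0, so that the single formula
    # lambda_i = x(v_i) - x(v_{i+1}) also yields lambda_0 and lambda_n.
    vals = [1.0] + [x[v] for v in order] + [0.0]
    return [(vals[i] - vals[i + 1], frozenset(order[:i]))
            for i in range(len(order) + 1)]

def lovasz_extension(f, x):
    """Expected value of f under the chain distribution with marginals x."""
    return sum(lam * f(s) for lam, s in chain_distribution(x))
```

Evaluating the extension takes one sort and $n + 1$ value queries, and the same code works for any set function, which reflects the obliviousness noted above.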
3.1.3 Equivalence of Lovasz Extension and Convex Closure 

We will now show that, for a submodular function $f$, the Lovasz extension and the convex closure
are one and the same. This is good news, since we can evaluate the Lovasz Extension efficiently at
any $x \in [0,1]^X$, and moreover we can explicitly construct a distribution with marginals $x$ attaining
the value of the Lovasz Extension at $x$. This has implications for minimization of submodular
functions, as we will show in Section 4.1.

The intuition behind this equivalence is quite simple. Recall that, from Definition 3.2, the value
$f^-(x)$ is simply the minimum possible expected value of $f$ over a distribution on $2^X$ with marginals
$x$. Fixing $f$ and $x$, we ask the question: what could a distribution $D_f^-(x)$ attaining this minimum
look like? Submodularity of $f$ implies that $f$ exhibits diminishing marginal returns. Therefore,
subject to the marginals $x$, the value of $f$ is smallest for distributions that "pack" as many elements
together as possible in expectation. By definition, that is roughly what $D^{\mathcal{L}}(x)$ is doing: it packs as
many elements together subject to the smallest marginal, then packs as many unsaturated elements
together until the next marginal is saturated, etc.

While the above intuition is helpful, the proof is made precise by cleaner uncrossing arguments.
To illustrate a simple uncrossing argument, consider two sets $A, B \in 2^X$ that are crossing: neither
$A \subseteq B$ nor $B \subseteq A$. Now, consider a simple distribution $D$ that outputs each of $A$ and $B$ with
probability 1/2. Now, consider uncrossing $D$ to form the distribution $D'$, which outputs each of
$A \cap B$ and $A \cup B$ with probability 1/2. Observe that $D$ and $D'$ have the same marginals, yet by
direct application of Definition 2.1 we conclude that

$$\mathbb{E}_{S \sim D'} f(S) = \frac{1}{2} (f(A \cap B) + f(A \cup B)) \le \frac{1}{2} (f(A) + f(B)) = \mathbb{E}_{S \sim D} f(S)$$

Therefore, starting with any distribution, we can keep uncrossing it without changing the 
marginals or increasing the expected value of /. To conclude that this process terminates with 
a chain distribution, we need a notion of progress. We make this precise in the following Lemma 
and subsequent Theorem. 

Lemma 3.6. Fix a submodular function $f : 2^X \to \mathbb{R}$. Let $D$ be an arbitrary distribution on $2^X$
with marginals $x$. If $D$ is not a chain distribution, then there exists another distribution $D'$ with
marginals $x$ and $\mathbb{E}_{S \sim D'} f(S) \le \mathbb{E}_{S \sim D} f(S)$, such that $\mathbb{E}_{S \sim D'} |S|^2 > \mathbb{E}_{S \sim D} |S|^2$.

In other words, any non-chain distribution $D$ can be uncrossed to form a distribution $D'$ that
is no worse, and is closer to being a chain distribution. The quantity $\mathbb{E}_{S \sim D} |S|^2$ is simply a potential
function that measures progress towards a chain distribution; other choices of potential function
work equally well.



Proof of Lemma 3.6. Fix $f$, $D$ and $x$ as in the statement of the Lemma. Assume $D$ is not a chain
distribution. Therefore, there exist two sets $A, B \subseteq X$ in the support of $D$ (i.e. $\Pr_D[A], \Pr_D[B] >
0$) that are crossing: neither $A \subseteq B$ nor $B \subseteq A$. Assume without loss of generality that $\Pr_D[B] \ge
\Pr_D[A]$. We define a new distribution $D'$ that simply replaces draws of $A$ and $B$ with draws of
$A \cap B$, $B$, and $A \cup B$, as follows.

$$\Pr_{D'}[S] = \Pr_D[S] \quad \text{for } S \notin \{A, B, A \cap B, A \cup B\}$$

$$\Pr_{D'}[A \cap B] = \Pr_D[A \cap B] + \Pr_D[A]$$

$$\Pr_{D'}[A \cup B] = \Pr_D[A \cup B] + \Pr_D[A]$$

$$\Pr_{D'}[B] = \Pr_D[B] - \Pr_D[A]$$

$$\Pr_{D'}[A] = 0$$

Notice that distribution $D'$ simply pairs up draws of $A$ and $B$ from $D$, and replaces each such
pair with a draw of $A \cap B$ and a draw of $A \cup B$. It is easy to check that this does not change the
marginals $x$. Moreover, this allows us to conclude that the difference in the expected value of $f$ is
given by:

$$\mathbb{E}_{S \sim D'} f(S) - \mathbb{E}_{S \sim D} f(S) = \left[\Pr_D[A] f(A \cap B) + \Pr_D[A] f(A \cup B)\right] - \left[\Pr_D[A] f(A) + \Pr_D[A] f(B)\right]$$

Directly applying Definition 2.1 we conclude that this quantity is at most 0. As for the change in
the potential function $\mathbb{E}[|S|^2]$, we get

$$\mathbb{E}_{S \sim D'} |S|^2 - \mathbb{E}_{S \sim D} |S|^2 = \left(\Pr_D[A] \cdot |A \cup B|^2 + \Pr_D[A] \cdot |A \cap B|^2\right) - \left(\Pr_D[A] \cdot |A|^2 + \Pr_D[A] \cdot |B|^2\right)$$
$$= \Pr_D[A] \left(|A \cup B|^2 + |A \cap B|^2 - |A|^2 - |B|^2\right) > 0$$

Where the last inequality follows from the inclusion-exclusion equation and the strict convexity of
the squaring function. □
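
The uncrossing step in this proof can be sketched directly in Python. In the sketch below, a distribution is a dictionary from frozensets to exact `Fraction` probabilities; `uncross` performs one step of the proof and `to_chain` repeats it until no crossing pair remains. All names, the step cap `max_steps`, and the example distribution are assumptions made for illustration.

```python
from fractions import Fraction

def uncross(dist, a, b):
    """One uncrossing step on crossing sets a, b in the support of dist."""
    if dist[a] > dist[b]:
        a, b = b, a                      # without loss, Pr[B] >= Pr[A]
    p = dist[a]
    new = dict(dist)
    new[a & b] = new.get(a & b, 0) + p   # Pr'[A n B] = Pr[A n B] + Pr[A]
    new[a | b] = new.get(a | b, 0) + p   # Pr'[A u B] = Pr[A u B] + Pr[A]
    new[b] -= p                          # Pr'[B] = Pr[B] - Pr[A]
    del new[a]                           # Pr'[A] = 0
    return {s: q for s, q in new.items() if q > 0}

def find_crossing(dist):
    """Return some crossing pair in the support, or None for a chain."""
    sets = list(dist)
    for i, s in enumerate(sets):
        for t in sets[i + 1:]:
            if not (s <= t or t <= s):
                return s, t
    return None

def to_chain(dist, max_steps=1000):
    """Uncross repeatedly; max_steps is only a safety cap for this sketch."""
    for _ in range(max_steps):
        pair = find_crossing(dist)
        if pair is None:
            return dist
        dist = uncross(dist, *pair)
    raise RuntimeError("step cap reached before a chain was found")

def marginals(dist, ground):
    return {v: sum(p for s, p in dist.items() if v in s) for v in ground}

def expectation(dist, g):
    return sum(p * g(s) for s, p in dist.items())

# Hypothetical starting distribution on subsets of {1, 2, 3}.
START = {frozenset({1, 2}): Fraction(1, 4),
         frozenset({2, 3}): Fraction(1, 2),
         frozenset({1}): Fraction(1, 4)}
```

On `START`, with the submodular function `f(S) = min(|S|, 2)`, each step keeps the marginals fixed, does not increase the expected value of `f`, and strictly increases the potential, as the Lemma asserts.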

Now that we know we can "uncross" any non-chain distribution without increasing the expec- 
tation of / or changing the marginals, we get the Theorem. 

Theorem 3.7. Fix a submodular function $f : 2^X \to \mathbb{R}$. For any $x \in [0,1]^X$, we can take $D_f^-(x) =
D^{\mathcal{L}}(x)$ (without loss), and therefore $f^-(x) = \mathcal{L}_f(x)$. Thus $f^- = \mathcal{L}_f$.

Proof. Fix $f$ and $x$. Let $D^*$ be a choice for $D_f^-(x)$ maximizing $\mathbb{E}_{S \sim D^*} |S|^2$. The maximum is
attained by standard compactness arguments. We will show that $D^*$ is a chain distribution, which
by Fact 3.5 implies that $D^* = D^{\mathcal{L}}(x)$, completing the proof.

Indeed, if $D^*$ were not a chain distribution, then by Lemma 3.6 there exists another choice $D'$
for $D_f^-(x)$ such that $\mathbb{E}_{S \sim D'} |S|^2 > \mathbb{E}_{S \sim D^*} |S|^2$. This contradicts the definition of $D^*$. □
The above Theorem implies the following remarkable observation: The distribution minimizing
a submodular function subject to given marginals can be chosen obliviously, since $D^{\mathcal{L}}(x)$ does not
depend on the particular submodular function $f$ being minimized. As we will see in the next
section, the same does not hold for maximization.

For completeness, we conclude with a strong converse of Theorem 3.7.

Theorem 3.8. Fix a set function $f : 2^X \to \mathbb{R}$. If $\mathcal{L}_f$ is convex then $f$ is submodular.

Proof. We take a non-submodular $f$, and show that $\mathcal{L}_f$ is non-convex. We will show that the Lovasz
extension makes a suboptimal choice for minimization at some $x \in [0,1]^X$: namely, $\mathcal{L}_f(x) > f^-(x)$.
By Definition 3.1, $f^-$ is the point-wise greatest convex extension of $f$. This implies that $\mathcal{L}_f$ is
non-convex.

We now exhibit $x$ such that $\mathcal{L}_f(x) > f^-(x)$. By Definition 2.2 there exists a set $A \subseteq X$, and
two elements $i, j \notin A$, such that

$$f(A \cup \{i,j\}) - f(A \cup \{i\}) > f(A \cup \{j\}) - f(A)$$

Define $x \in [0,1]^X$ as follows: $x(k) = 1$ for each $k \in A$, and $x(i) = x(j) = 1/2$, and $x(k) = 0$
otherwise. Now it is intuitively clear that the Lovasz Extension makes the wrong choice for
minimization: it will attempt to bundle $i$ and $j$ together despite increasing marginal returns.
Indeed, by Definition 3.4 the Lovasz extension at $x$ evaluates to

$$\mathcal{L}_f(x) = \frac{1}{2} f(A \cup \{i,j\}) + \frac{1}{2} f(A)$$

Now, consider the distribution $D$ with marginals $x$, defined by $\Pr_D[A \cup \{i\}] = \Pr_D[A \cup \{j\}] = \frac{1}{2}$.
Since, by Definition 3.2, $f^-(x)$ lowerbounds the expectation of any distribution with marginals $x$,
we have that

$$f^-(x) \le \frac{1}{2} f(A \cup \{i\}) + \frac{1}{2} f(A \cup \{j\})$$

We can now combine the three relations above to establish $\mathcal{L}_f(x) > f^-(x)$, completing the proof.

$$2 (\mathcal{L}_f(x) - f^-(x)) \ge f(A \cup \{i,j\}) + f(A) - f(A \cup \{i\}) - f(A \cup \{j\}) > 0$$

□

3.2 The Concave Closure 

The concave closure $f^+$ of any set function $f$ can be defined analogously to the convex closure. The
intuition is similar: $f^+$ can be constructed from $f$ by first plotting $f$ in $\mathbb{R}^{|X|+1}$, and then placing
a "blanket" above the resulting graph and pulling down until the blanket is taut. We again state
the two equivalent formal definitions.

Definition 3.9. For a set function $f : 2^X \to \mathbb{R}$, the concave closure $f^+ : [0,1]^X \to \mathbb{R}$ is the
point-wise lowest concave function from $[0,1]^X$ to $\mathbb{R}$ that always upperbounds $f$.

Definition 3.10. Fix a set function $f : 2^X \to \mathbb{R}$. For every $x \in [0,1]^X$, let $D_f^+(x)$ denote a
distribution over $2^X$, with marginals $x$, maximizing $\mathbb{E}_{S \sim D_f^+(x)}[f(S)]$ (breaking ties arbitrarily). The
Concave Closure $f^+$ can be defined as follows: $f^+(x)$ is the expected value of $f(S)$ over draws $S$
from $D_f^+(x)$.
By a similar argument to that presented in Section 3.1.1, both definitions are well-defined and
equivalent.

It is tempting to attempt to explicitly characterize the distribution $D_f^+(x)$ in the same way we
characterized $D_f^-(x)$. However, no such tractable characterization is possible. In fact, it is NP-hard
to even evaluate $f^+(x)$, even when $f$ is a graph cut function.

Theorem 3.11. It is NP-hard to evaluate $f^+(x)$ for an arbitrary submodular $f : 2^X \to \mathbb{R}$
and $x \in [0,1]^X$. This is true even when $f$ is a graph cut function.

Proof. The proof is by reduction from the NP-hard problem Max-Cut. In the max cut problem,
we are given an undirected graph $G = (V, E)$, and the goal is to find a cut $(S, V \setminus S)$ maximizing
the number of edges crossing the cut. Let $f(S)$ be the number of edges crossing the cut $(S, V \setminus S)$.

We reduce finding the maximum non-trivial cut (with $S \ne \emptyset, V$) to the following convex
optimization problem: Maximize $f^+(x)$ subject to $1 \le \mathbf{1} \cdot x \le n - 1$. Indeed, it is clear that this
is a relaxation of the max-cut problem. The optimum is attained at an integer point $x^*$, since
without loss of generality the trivial sets ($\emptyset$ and $V$) will not be in the support of any optimum
distribution. Therefore, if $f^+(x)$ can be evaluated in polynomial time for an arbitrary $x$, then this
convex optimization problem can be solved efficiently. This completes the reduction. □

Stronger hardness results are possible. In fact, it is shown in [7] that, even when $f$ is a monotone
coverage function and $k$ is an integer, the convex optimization problem $\max \{f^+(x) : \mathbf{1} \cdot x \le k\}$
is APX-hard. More generally, it is shown in [3] that it is hard to maximize general submodular
functions in the value oracle model (independently of $P \ne NP$) with an approximation factor
better than 1/2.

In light of these difficulties, there is no hope of finding exact polynomial time algorithms for
maximizing submodular functions in most interesting settings, using $f^+$ or otherwise. Therefore, we
will consider another extension of submodular functions that will prove useful in attaining constant
factor approximations for maximization problems.

3.3 The Multilinear Extension and Independent Distributions

3.3.1 Defining the Multilinear Extension 

Ideally, since concavity is intimately tied to maximization, we could use the concave closure
of a submodular function in relaxations of maximization problems. However, unlike the convex
closure, the concave closure of a submodular function cannot be evaluated efficiently. Moreover,
since $f^+$ is the point-wise lowest concave extension of $f$, any other concave extension will have a
non-trivial integrality gap relative to most interesting polytopes, including even the hypercube. In other
words, any concave extension other than $f^+$ will not correspond to a distribution at every point of
the domain $[0,1]^X$; a property that has served us particularly well in minimization problems.

In light of these limitations of concave extensions, we relax this requirement and instead exhibit
a simple extension that is up-concave: concave in all directions $u \in \mathbb{R}^n$ with $u \ge 0$ (or, equivalently,
$u \le 0$). Moreover, this extension will correspond to a natural distribution at every point, and
therefore will have no integrality gap on the domain $[0,1]^X$. Surprisingly, this extension will also
have no integrality gap over any matroid polytope. As we will see in Section 4.2, it turns out that,
under some additional conditions, up-concave functions can be approximately maximized over a
large class of polytopes.
Without further ado, we define the multilinear extension $F$ of a set function $f$. First, we say a
function $F : [0,1]^X \to \mathbb{R}$ is multilinear if it is linear in each variable $x_i$, when the other variables
$\{x_j\}_{j \ne i}$ are held fixed. It is easy to see that multilinear functions from $\mathbb{R}^X$ to $\mathbb{R}$ form a vector
space. Moreover, a simple induction on dimension shows that a multilinear function is uniquely
determined by its values on the vertices of the hypercube. This allows us to define the multilinear
extension.

Definition 3.12. Fix a set function $f : 2^X \to \mathbb{R}$. The multilinear extension $F : [0,1]^X \to \mathbb{R}$ of $f$ is
the unique multilinear function agreeing with $f$ on the vertices of the hypercube.

As with the Lovasz extension, the multilinear extension corresponds to a natural distribution at
each point $x \in [0,1]^X$, and moreover this distribution has marginals $x$. This distribution becomes
apparent if we express $F$ in terms of a simple basis, with each element of the basis corresponding to
a vertex of the hypercube. For a set $S \subseteq X$, we define the multilinear basis function $M_S$ as follows

$$M_S(x) = \prod_{i \in S} x_i \cdot \prod_{i \notin S} (1 - x_i)$$

Since a multilinear function is uniquely determined by its values on the hypercube, it is easy to
check that any multilinear function can be written as a linear combination of the basis functions
$\{M_S\}_{S \subseteq X}$; for $F$, the coefficient of $M_S$ is $f(S)$.

$$F(x) = \sum_{S \subseteq X} f(S) \cdot M_S(x) = \sum_{S \subseteq X} f(S) \cdot \prod_{i \in S} x_i \cdot \prod_{i \notin S} (1 - x_i)$$

Inspecting the above expression, we notice that $F(x)$ corresponds to a simple distribution
with marginals $x$. Let $D^I(x)$ be the distribution on $2^X$ that simply picks each element $v \in X$
independently with probability $x(v)$. It is clear that $\Pr_{D^I(x)}[S] = M_S(x)$. Therefore, it is clear that
$F(x)$ is simply the expected value of $f$ over draws from $D^I(x)$. This gives the following equivalent
definition of the multilinear extension.

Definition 3.13. Fix a set function $f : 2^X \to \mathbb{R}$. For each $x \in [0,1]^X$, let $D^I(x)$ be the distribution
on $2^X$ that picks each $v \in X$ independently with probability $x(v)$. The value of the multilinear
extension $F : [0,1]^X \to \mathbb{R}$ at $x$ can be defined as the expected value of $f$ over draws from $D^I(x)$.

$$F(x) = \mathbb{E}_{S \sim D^I(x)} f(S) = \sum_{S \subseteq X} f(S) \cdot \prod_{i \in S} x_i \cdot \prod_{i \notin S} (1 - x_i) \qquad (1)$$

We note that, like the Lovasz extension, the multilinear extension has the property of being 
oblivious: the distribution defining F at x does not depend on the set function /. The fact that, 
yet again, an oblivious extension lends itself particularly well to solving optimization problems is a 
remarkable, and arguably fundamental phenomenon. 
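
Both readings of the multilinear extension, the explicit sum in (1) and the expectation over independent draws, are easy to sketch in Python. The exact evaluation below enumerates all $2^n$ subsets and is therefore only usable for tiny ground sets; the sampling estimator is the usual workaround in the value oracle model. Function names, the sample count, and the fixed seed are assumptions made for illustration.

```python
import random
from itertools import combinations

def multilinear(f, x):
    """Exact F(x): sum over S of f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i)."""
    items = list(x)
    total = 0.0
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            prob = 1.0                       # this is M_S(x)
            for v in items:
                prob *= x[v] if v in s else 1.0 - x[v]
            total += prob * f(s)
    return total

def multilinear_sampled(f, x, samples=20000, seed=0):
    """Monte Carlo estimate of F(x) from independent draws with marginals x."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        s = frozenset(v for v in x if rng.random() < x[v])
        acc += f(s)
    return acc / samples
```

For example, with `f(S) = min(|S|, 1)` on two elements and `x = (1/2, 1/2)`, the exact value is `1 - 1/4 = 3/4`, the probability that the drawn set is nonempty.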

3.3.2 Useful Properties of the Multilinear Extension 

In this section, we will develop some properties of the multilinear extension that will be useful for
problems involving maximization of submodular functions. The maximization problem we consider
in Section 4.2 is that of maximizing a monotone, submodular function $f : 2^X \to \mathbb{R}$ over independent
sets of a matroid $M = (X, \mathcal{I})$.

First, we show that the multilinear relaxation of a monotone set function is also monotone.

Proposition 3.14. If $f : 2^X \to \mathbb{R}$ is a monotone set function, then its multilinear relaxation
$F : [0,1]^X \to \mathbb{R}$ is monotone. That is, whenever $x \le y$, we have $F(x) \le F(y)$.

Proof. By Definition 3.13, it suffices to show that distribution $D^I(y)$ draws a pointwise larger
set than $D^I(x)$. We couple draws $S_x \sim D^I(x)$ and $S_y \sim D^I(y)$ in the obvious way: for each
$v \in X$ we independently draw a random variable $R(v)$ from the uniform distribution on $[0,1]$.
If $0 \le R(v) \le x(v)$, then we let $v \in S_x$, otherwise $v \notin S_x$. Similarly, $v \in S_y$ if and only if
$0 \le R(v) \le y(v)$. Since we have $x(v) \le y(v)$ for every $v \in X$, it is clear that, under this coupling,
$S_x \subseteq S_y$ pointwise. By monotonicity of $f$, this implies that $F(x) \le F(y)$. □

Next, we will show that the multilinear extension $F$ of a submodular $f$ is up-concave: concave
when restricted to any direction $u \ge 0$ (equivalently $u \le 0$). In other words, it must be that for any
$x \in [0,1]^X$ and $u \ge 0$, the expression $F(x + tu)$ is concave as a function of $t \in \mathbb{R}$ over the domain
of $F$. This is consistent with the diminishing-marginal-returns interpretation of submodularity
and the independent distribution interpretation of $F$: weakly increasing the marginal probability
of drawing each item can only result in items getting packed together into larger and larger sets
(in a point-wise sense), and hence yielding diminishing marginal increases in the expected value
of $f$. This intuition can be made precise by carefully coupling draws from $D^I(x)$ and $D^I(y)$ for
some $x \le y$, and considering the marginal increases from transitioning from $D^I(x)$ and $D^I(y)$ to
$D^I(x + \delta u)$ and $D^I(y + \delta u)$ respectively (for some $u \ge 0$ and arbitrarily small $\delta > 0$). However,
we will instead use tools from linear algebra to get a cleaner proof.

Using elementary linear algebra, up-concavity can be re-stated as a condition on the Hessian
matrix $\nabla^2 F(x)$ of $F$ at $x$. The matrix $\nabla^2 F(x)$ is the symmetric matrix with rows and columns
indexed by $X$, and the $(i,j)$'th entry corresponding to the second partial derivative $\frac{\partial^2 F}{\partial x_i \partial x_j}(x)$.
Up-concavity is then the condition that

$$u^T (\nabla^2 F(x)) u \le 0 \quad \text{for all } x \in [0,1]^X \text{ and } u \ge 0$$

Since $F$ is multilinear, the diagonal entries of $\nabla^2 F(x)$ are always 0. Therefore, by considering
$\mathbf{1}_{\{i,j\}}$ as choices for $u$, we conclude that $F$ is up-concave if and only if all the second
partial derivatives $\frac{\partial^2 F}{\partial x_i \partial x_j}(x)$ are non-positive. Indeed, this is consistent with submodularity and
the independent-distribution interpretation of $F$: increasing the probability of including $i$ only
results in sets that are point-wise larger, and therefore these sets would benefit less by inclusion of
$j$ as well.

Proposition 3.15. ([8]) If $f : 2^X \to \mathbb{R}$ is submodular, then its multilinear relaxation $F : [0,1]^X \to
\mathbb{R}$ is up-concave.

Proof. By the discussion above it suffices to show that, for each $x \in [0,1]^X$ and $i \ne j$, we have that
$\frac{\partial^2 F}{\partial x_i \partial x_j}(x) \le 0$. Fixing $x$, we take the derivative of $F$ with respect to $x_i$ to get:

$$\frac{\partial F}{\partial x_i}(x) = \mathbb{E}_{S \sim D^I(x)} \left[f(S \cup \{i\}) - f(S \setminus \{i\})\right]$$

The equality above follows immediately from the independent distribution interpretation of $F$,
by conditioning on all events $j \in S$ for $j \ne i$ and considering the expectation of $f$ as a function of
the marginal probability of $i$. Using linearity of expectation and taking the derivative again in the
same way with respect to $x_j$, we get

$$\frac{\partial^2 F}{\partial x_i \partial x_j}(x) = \mathbb{E}_{S \sim D^I(x)} \left[f(S \cup \{i,j\}) - f(S \cup \{i\} \setminus \{j\}) - f(S \cup \{j\} \setminus \{i\}) + f(S \setminus \{i,j\})\right]$$

Using Definition 2.1 we get that this quantity is non-positive, as needed. □
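
Because $F$ is linear in each coordinate, the mixed second partial derivative can be computed exactly as a second difference of $F$ with $x_i$ and $x_j$ pinned to 0 or 1, with no need for numerical differentiation. The Python sketch below uses this to check the sign condition on a small hypothetical coverage instance; the names `COVER`, `coverage`, and `mixed_partial` are illustrative assumptions, and the exact evaluation is exponential in $n$.

```python
from itertools import combinations

def multilinear(f, x):
    """Exact multilinear extension by enumerating all subsets (tiny n only)."""
    items = list(x)
    total = 0.0
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            prob = 1.0
            for v in items:
                prob *= x[v] if v in s else 1.0 - x[v]
            total += prob * f(s)
    return total

def mixed_partial(f, x, i, j):
    """d^2 F / dx_i dx_j at x, as a second difference over the pinned values."""
    def pinned(xi, xj):
        return multilinear(f, {**x, i: xi, j: xj})
    return pinned(1.0, 1.0) - pinned(1.0, 0.0) - pinned(0.0, 1.0) + pinned(0.0, 0.0)

# Hypothetical coverage instance: u1 and u2 overlap, u1 and u3 do not.
COVER = {"u1": {1, 2}, "u2": {2, 3}, "u3": {3, 4, 5}}

def coverage(s):
    covered = set()
    for name in s:
        covered |= COVER[name]
    return len(covered)
```

For the coverage function every mixed partial is non-positive (overlapping sets give a strictly negative entry, disjoint sets give zero), while the supermodular function $|S|^2$ gives a positive entry.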

It is clear that, since the value of $F$ at any point corresponds to the expectation of $f$ at a
distribution on $2^X$, $F$ has no integrality gap relative to the hypercube $[0,1]^X$. Since we will
consider constrained maximization problems, it would be useful if this held for interesting subsets
of the hypercube. It turns out that, for a submodular function $f$, a useful property that we term
cross-convexity yields precisely such a guarantee relative to all matroid polytopes. Cross-convexity
means that trading off two elements $i$ and $j$ gives a convex function, or increasing marginal returns.

Definition 3.16. We say a function $F : [0,1]^X \to \mathbb{R}$ is cross-convex if, for any $x$ and any $i \ne j$, the function
$F^x_{i,j}(\epsilon) := F(x + \epsilon(e_i - e_j))$ is convex as a function of $\epsilon \in \mathbb{R}$.

Cross-convexity is consistent with submodularity and the independent distribution interpretation
of the multilinear relaxation. Consider the independent distribution $D^I(x + \epsilon(e_i - e_j))$ and the associated
expectation of $f$. It is an easy exercise to see that the probability of "collision" of $i$ and $j$ - that
is, the probability that both are drawn by the independent distribution - is a concave function of $\epsilon$.
Since "collision" corresponds to diminishing marginal returns, or a decrease in the expected value
of $f$, this means that the expectation of $f$ is convex in $\epsilon$. We make this precise in the proposition
below.

Proposition 3.17. When $f : 2^X \to \mathbb{R}$ is submodular, its multilinear extension $F$ is
cross-convex.

Proof. Fix $x$ and $i \ne j$. Consider the random variable $S$: the set of elements other than $i$ and $j$
that are drawn from $D^I(x + \epsilon(e_i - e_j))$. We can write $F^x_{i,j}(\epsilon)$ as:

$$F^x_{i,j}(\epsilon) = \mathbb{E}_S \big[ (x_i + \epsilon)(x_j - \epsilon) f(S \cup \{i,j\})$$
$$+ (x_i + \epsilon)(1 - x_j + \epsilon) f(S \cup \{i\})$$
$$+ (1 - x_i - \epsilon)(x_j - \epsilon) f(S \cup \{j\})$$
$$+ (1 - x_i - \epsilon)(1 - x_j + \epsilon) f(S) \big]$$

Observe that the coefficient of $\epsilon^2$ in the above expression is $f(S \cup \{i\}) + f(S \cup \{j\}) - f(S \cup
\{i,j\}) - f(S)$. By Definition 2.1 this is nonnegative, which yields convexity in $\epsilon$ as needed. □

Consider any $x \in [0,1]^X$ and any fractional $x_i, x_j$. We can trade off items $i$ and $j$, in the sense
defined above, until one of them is integral. Cross-convexity implies that the maximum point of
this tradeoff lies at the extremes. Therefore, repeating this process as long as there are fractional
variables, we can arrive at an integer point $x' \in \{0,1\}^X$ such that $F(x') \ge F(x)$. When the set
of feasible solutions is constrained to a proper subset of the hypercube, however, this may result
in an infeasible $x'$. Nevertheless, for well-structured matroid polytopes, a careful rounding process
maintains feasibility without decreasing the objective value. This is known as Pipage rounding, and
will be presented in Section 4.2.
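
The unconstrained tradeoff process described above can be sketched as follows. This is only the hypercube case, not Pipage rounding: it ignores feasibility in any polytope. It uses exact `Fraction` arithmetic so that coordinates reach 0 or 1 exactly, and it evaluates $F$ by brute force, so it is meant for tiny instances. All names and the coverage instance are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def multilinear(f, x):
    """Exact multilinear extension with Fraction arithmetic (tiny n only)."""
    items = list(x)
    total = Fraction(0)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            prob = Fraction(1)
            for v in items:
                prob *= x[v] if v in s else 1 - x[v]
            total += prob * f(s)
    return total

def round_by_tradeoffs(f, x):
    """Move along e_i - e_j to the better endpoint until x is integral."""
    x = dict(x)
    while True:
        fractional = [v for v in x if 0 < x[v] < 1]
        if not fractional:
            return x
        if len(fractional) == 1:
            # F is linear in a single coordinate, so an endpoint is optimal.
            i = fractional[0]
            candidates = [{**x, i: Fraction(0)}, {**x, i: Fraction(1)}]
        else:
            i, j = fractional[0], fractional[1]
            lo = -min(x[i], 1 - x[j])    # x_i hits 0 or x_j hits 1
            hi = min(1 - x[i], x[j])     # x_i hits 1 or x_j hits 0
            candidates = [{**x, i: x[i] + e, j: x[j] - e} for e in (lo, hi)]
        # By cross-convexity, the better endpoint is at least as good as x.
        x = max(candidates, key=lambda c: multilinear(f, c))

# Hypothetical coverage instance used only to exercise the sketch.
COVER = {"u1": {1, 2}, "u2": {2, 3}, "u3": {3, 4, 5}}

def coverage(s):
    covered = set()
    for name in s:
        covered |= COVER[name]
    return len(covered)
```

Each iteration makes at least one more coordinate integral, so the loop ends after at most $n$ steps.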
4 Algorithmic Implications



In this section, we consider minimization and maximization problems for submodular functions.
The algorithms we consider will make heavy use of the extensions described in Section 3.

The algorithms we consider take as input a set $X$ with $|X| = n$, and a rational number $B$. For
the maximization problem we consider, additional constraints are given as input; we defer details
to Section 4.2. The function $f : 2^X \to \mathbb{Q}$ to be minimized or maximized is assumed to
have values bounded in terms of $B$. The algorithms we present will operate in the value oracle
model. We require that the algorithms run in time polynomial in $n$ and $\log B$, and therefore
also make a polynomial number of queries to the value oracle.



4.1 Minimizing Submodular Functions 

Proposition 3.3 allows us to reduce discrete optimization to continuous optimization. Namely, we
reduce minimization of $f$ to minimization of its convex closure $f^-$. When $f^-$ can be evaluated
efficiently, this yields an efficient algorithm for minimizing $f$ using the standard techniques of
convex optimization.

When $f$ is submodular, $f^- = \mathcal{L}_f$. It is clear from Section 3.1.2 that the Lovász
extension can be evaluated efficiently: we can explicitly construct the distribution $D^L(x)$, which
has support of size at most $n + 1$, and then explicitly compute the expected value of $f$ over
draws from $D^L(x)$. Therefore, we can compute the minimum of a submodular function by finding the
minimum of its Lovász extension.

Theorem 4.1. There exists an algorithm for minimizing a submodular function $f : 2^X \to \mathbb{R}$
in the value query model, running in time polynomial in $n$ and $\log B$.
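
To make the evaluation step concrete, here is a minimal sketch of computing the Lovász extension
through its distribution. It assumes the standard threshold construction (draw $\theta$ uniformly
from $(0,1]$ and output $\{i : x_i \geq \theta\}$), which is taken here as the meaning of $D^L(x)$;
the function names and the path-graph example are illustrative only.

```python
def cut_function(edges):
    """Cut function of an undirected graph: number of edges with one endpoint in S."""
    def f(S):
        S = set(S)
        return sum(1 for u, v in edges if (u in S) != (v in S))
    return f

def lovasz_distribution(x):
    """Return [(probability, set)] for the threshold sets of x; at most n + 1 entries."""
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])      # coordinates in decreasing order
    levels = [1.0] + [x[i] for i in order] + [0.0]
    dist = []
    for m in range(n + 1):
        prob = levels[m] - levels[m + 1]               # mass of thresholds selecting the top m
        if prob > 0:
            dist.append((prob, frozenset(order[:m])))
    return dist

def lovasz_extension(f, x):
    """Expected value of f over draws from the threshold distribution of x."""
    return sum(prob * f(S) for prob, S in lovasz_distribution(x))

f = cut_function([(0, 1), (1, 2)])                     # path on three nodes
for point in ([1.0, 0.5, 0.0], [1.0, 0.0, 1.0]):
    print(point, lovasz_distribution(point), lovasz_extension(f, point))
```

The support is a chain of nested sets, which is why it has size at most $n + 1$ and why a single
evaluation costs only $O(n \log n)$ time plus at most $n + 1$ oracle queries.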



4.2 Maximizing Monotone Submodular Functions Subject to a Matroid Constraint

In this section, we consider the problem of maximizing a nonnegative, monotone, submodular
function $f : 2^X \to \mathbb{R}$ over independent sets of a matroid $M = (X, \mathcal{I})$. We
assume $f$ is given by a value oracle as usual, and $M$ is given by an independence oracle: an
oracle that answers queries of the form: is $S \in \mathcal{I}$? It is well known that much can be
accomplished in this independence oracle model. In particular, we can use submodular function
minimization, presented in Section 4.1, to get a separation oracle for the matroid polytope $P(M)$.

First, we begin where we left off in Section 3.3. Namely, we will show that we can indeed reduce
maximization of $f$ over $M$ to maximization of the multilinear relaxation $F$ over the polytope
$P(M)$. In particular, we show that $F$ has no integrality gap relative to $P(M)$, and the rounding
can be done in polynomial time. This is known as Pipage Rounding.

Lemma 4.2. (IJ^) Fix a submodular function $f : 2^X \to \mathbb{R}$ and its multilinear relaxation
$F$. Fix a matroid $M = (X, \mathcal{I})$. For every point $x \in P(M)$, there exists an integer
point $x' \in P(M)$ such that $F(x') \geq F(x)$. Therefore, $F$ has no integrality gap relative to
$P(M)$. Moreover, starting with $x$, we can construct $x'$ in polynomial time.

Proof. Recall that the rank function $r_M$ of matroid $M$ is an integer valued, normalized,
monotone, and submodular set function. Moreover, recall that the matroid polytope is as defined in
2.3. In the ensuing discussion, we will assume that we can efficiently check whether $x \in P(M)$,
and moreover we can find a tight constraint when $x$ is on the boundary of $P(M)$. Both problems are
solvable by submodular function minimization.

By multilinearity, the lemma is trivial when there is only a single fractional variable.
Moreover, by multilinearity we may assume without loss of generality that every fractional variable
appears in at least one tight constraint of the matroid polytope.

It follows from the submodularity of $r_M$ that the family of "tight sets", those sets
$S \subseteq X$ with $x(S) = r_M(S)$, is closed under intersection and union. Therefore, we consider
a minimal tight set $T$ with fractional variables $x_i$ and $x_j$, and trade off $x_i$ and $x_j$
subject to not violating feasibility (i.e. not leaving the matroid polytope $P(M)$). Observe that,
by cross-convexity, we can choose the extreme point of this tradeoff so that the value of $F$ does
not decrease. Moreover, one of two types of progress is made: either an additional variable is made
integral, or a new tight set $T'$ is created that includes exactly one of $i$ or $j$. It remains to
show that, repeating this process so long as there are fractional variables, the second type of
progress can occur consecutively at most $n$ times. This would complete the proof, showing that
after at most $n^2$ steps all variables are integral.

Observe that, since $T$ was chosen to be minimal and the tight sets are closed under intersection,
trading off $x_i$ and $x_j$ does not "untighten" any set. Therefore, this process can only grow the
family of tight sets. For simplicity, we assume that at each step we choose $T$ to be a tight set of
minimum cardinality. (This assumption can be easily removed by more careful accounting.) If no
variable is made integral after trading off $x_i$ and $x_j$, then an additional tight set $T'$ is
created that includes exactly one of $i$ or $j$. Since tight sets are closed under intersection, and
tight sets are preserved, this implies that the cardinality of the smallest tight set strictly
decreases. Therefore, a variable must be made integral after at most $n$ iterations, completing the
proof. □

Now, it remains to show that $F$ can be maximized approximately over $P(M)$. In fact, something
even more general is true, as shown by Vondrák: any nonnegative, monotone, up-concave function can
be approximately maximized over any solvable packing polytope contained in the hypercube. Here, by
packing polytope we mean a polytope $P \subseteq [0,1]^X$ that is down-monotone: if
$x, y \in [0,1]^X$ with $x \preceq y$ and $y \in P$, then $x \in P$. A polytope $P$ is solvable if
we can maximize arbitrary linear functions over $P$ in polynomial (in $n$) time, or equivalently if
$P$ admits a polynomial time separation oracle.

Lemma 4.3. (J^) Fix a solvable packing polytope $P \subseteq [0,1]^X$. Fix a nonnegative, monotone,
up-concave function $F : [0,1]^X \to \mathbb{R}_+$ that can be evaluated at an arbitrary point in
polynomial time. Then the problem $\max\{F(x) : x \in P\}$ can be approximated to within a factor of
$1 - 1/e$ in polynomial time.

Proof Sketch. We may assume without loss of generality that $F(0) = 0$. We let $OPT$ denote the
maximum value of $F$ in $P$, and use $x^*$ to denote the point in $P$ attaining this optimum. Since
$F$ is not concave in all directions, usual gradient descent techniques fail to provide any
guarantees. Instead, we will show a modified gradient-descent-like technique that exploits
up-concavity. We will consider a particle with starting position at $0 \in P$, and slowly move the
particle in positive directions only: directions $u \in \mathbb{R}_+^X$. This restriction is not
without loss: any local descent algorithm that does not "backtrack" cannot guarantee finding the
optimal solution. Nevertheless, by arguments analogous to those used for the greedy algorithm for
max-k-cover, we can guarantee a $1 - 1/e$ approximation. We assume the motion of the particle is a
continuous process, ignoring technical details related to discretizing this process so that it can
be simulated in polynomial time.

We use $x(t)$ to denote the position of the particle at time $t$. We interpret the position $x(t)$
of the particle as a convex combination of vertices $V_P$ of $P$, with vertex $v \in V_P$ having
coefficient $\alpha_v(t)$:

$$x(t) = \sum_{v \in V_P} \alpha_v(t) \cdot v$$

Initially, $\alpha_0(0) = 1$, and $\alpha_v(0) = 0$ for each vertex $v \neq 0$. So long as
$\alpha_0(t) > 0$, there is room for improvement in positive directions: we can replace $0$ in the
convex combination by some other vertex $z \neq 0$. By monotonicity, this increases the value of $F$.

More concretely, for a small $dt > 0$, we let $\alpha_0(t + dt) = \alpha_0(t) - dt$, and
$\alpha_z(t + dt) = \alpha_z(t) + dt$. We keep $\alpha_v(t + dt) = \alpha_v(t)$ for all
$v \neq 0, z$. It is clear that this process must terminate when $t = 1$, since at that point the
vertex $0$ is no longer represented in the convex combination. It remains to show how to choose $z$
at each step so that $F(x(1)) \geq (1 - 1/e)\,OPT$. By simple calculus, it suffices to show that $z$
can be chosen so that $\frac{dF(x(t))}{dt} \geq OPT - F(x(t))$. In other words, that the rate of
increase in the objective is proportional to the distance from the optimal. This is analogous to the
analysis of many discrete greedy algorithms, such as that for max-k-cover.
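
The "simple calculus" step can be spelled out as follows. Suppose the rate condition
$\frac{d}{dt} F(x(t)) \geq OPT - F(x(t))$ holds for all $t \in [0,1]$, and write
$g(t) = OPT - F(x(t))$ (the symbol $g$ is our own shorthand). Using $x(0) = 0$ and the normalization
$F(0) = 0$ from the start of the proof sketch:

```latex
\begin{align*}
g'(t) &= -\frac{d}{dt} F(x(t)) \le -g(t) \\
\frac{d}{dt}\bigl(e^{t} g(t)\bigr) &= e^{t}\bigl(g'(t) + g(t)\bigr) \le 0 \\
e \cdot g(1) &\le e^{0} g(0) = OPT - F(0) = OPT \\
F(x(1)) &= OPT - g(1) \ge \Bigl(1 - \tfrac{1}{e}\Bigr) OPT
\end{align*}
```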

Fixing a time $t$, what if we choose $z$ so as to maximize the local gain? In other words,

$$z = \operatorname{argmax}_{v \in V_P} \nabla F(x(t)) \cdot v$$

Finding such a $z$ reduces to maximizing a linear function over the polytope $P$, which can be
accomplished in polynomial time since $P$ is solvable. It remains to show that there exists a
$z' \in P$ with $\nabla F(x(t)) \cdot z' \geq OPT - F(x(t))$.

Consider $z' = \max(x(t), x^*) - x(t)$, where the maximization is taken coordinate-wise. We can
interpret $z'$ as the "set-wise difference" between $x^*$ and $x(t)$. Indeed, if $x^*$ and $x(t)$
were integral indicator vectors corresponding to subsets of $X$, then $z'$ is precisely the
indicator vector of their set difference. The difference between any two sets in a downwards-closed
set system is again in the set system. This analogy can be made precise to show that $z' \in P$ as
follows: $z' \preceq x^* \in P$, and $P$ is down-monotone.

We now show that $z'$ gives the desired marginal increase in objective. First, it is easy to see
that $x(t) + z' \succeq x^*$, and therefore by monotonicity $F(x(t) + z') \geq F(x^*) = OPT$.
Moreover, since $F$ is up-concave and $z' \succeq 0$, we get that
$\nabla F(x(t)) \cdot z' \geq OPT - F(x(t))$. This completes the proof. □
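
The continuous process above can be simulated with a fixed step size. The sketch below is a toy
discretization under several assumptions that are not in the survey: the polytope is
$\{x \in [0,1]^n : \sum_i x_i \leq k\}$ (so the linear maximization just picks the $k$ largest
gradient entries), $F$ is the multilinear extension of a small coverage function evaluated exactly
by enumeration, and the step count of 50 is arbitrary. It ignores the discretization error analysis
that the proof sketch also omits.

```python
from itertools import combinations, product

def coverage(sets):
    """Coverage function f(S) = size of the union of sets[i] over i in S."""
    def f(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered)
    return f

def multilinear(f, x):
    """Exact multilinear extension by enumerating all subsets (tiny n only)."""
    total = 0.0
    for bits in product([0, 1], repeat=len(x)):
        prob = 1.0
        for i, b in enumerate(bits):
            prob *= x[i] if b else 1.0 - x[i]
        total += prob * f([i for i, b in enumerate(bits) if b])
    return total

def gradient(f, x):
    """For multilinear F, dF/dx_i = F(x with x_i = 1) - F(x with x_i = 0)."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        hi[i] = 1.0
        lo = list(x)
        lo[i] = 0.0
        grad.append(multilinear(f, hi) - multilinear(f, lo))
    return grad

def continuous_greedy(f, n, k, steps=50):
    """Move from 0 in positive directions only, toward the vertex maximizing grad . z."""
    x = [0.0] * n
    for _ in range(steps):
        grad = gradient(f, x)
        best_vertex = sorted(range(n), key=lambda i: -grad[i])[:k]
        for i in best_vertex:
            x[i] = min(1.0, x[i] + 1.0 / steps)   # shift weight dt from vertex 0 to z
    return x

f = coverage([{1, 2}, {2, 3}, {3, 4}, {1, 4}])
n, k = 4, 2
x = continuous_greedy(f, n, k)
opt = max(f(S) for S in combinations(range(n), k))
print("F(x) / OPT =", multilinear(f, x) / opt)
```

The returned point is fractional in general; it would then be passed to a rounding procedure such
as the one in Lemma 4.2.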

When $F$ is the multilinear relaxation of $f$, we can evaluate $F$ to arbitrary precision by a
polynomial number of random samples [8]. Combining Lemmas 4.2 and 4.3 we get the following theorem.
Technical details that compensate for the loss of approximation due to sampling are omitted.

Theorem 4.4. ([8]) There exists an algorithm for maximizing a nonnegative, monotone, submodular
function $f : 2^X \to \mathbb{R}$, given by a value oracle, over a matroid $M$ given by an
independence oracle, that achieves an approximation ratio of $1 - 1/e$ and runs in time polynomial
in $n$ and $\log B$.

4.3 New Result: Minimizing Nonnegative Symmetric Submodular Functions 
Subject to a Cardinality Constraint 

In this section, we consider the problem of minimizing a nonnegative symmetric submodular function
subject to a cardinality constraint. First, we make the simple observation that, by submodularity,
the minimum of a symmetric submodular function is always attained at $\emptyset$ and $X$ (indeed,
$2f(S) = f(S) + f(X \setminus S) \geq f(\emptyset) + f(X) = 2f(\emptyset)$ for every $S$).
Therefore, as is usual when we are working with symmetric submodular functions, we consider
minimization of $f$ over non-empty sets. Moreover, observe that, by symmetry, an upper bound of $k$
on the cardinality is equivalent to a lower bound of $n - k$. Therefore, we assume without loss of
generality that we are minimizing $f$ over non-empty subsets of $X$ of cardinality at most $k$.

Symmetric submodular functions often arise as cut-type functions. The cut-function of an 
undirected graph is the canonical example. In this context, our problem is equivalent to finding the 
minimum cut of the graph that is sufficiently unbalanced: i.e. with smaller side having cardinality 
at most k. We term this problem the minimum-unbalanced-cut problem, and point out that it has 
obvious implications for finding small "communities" in social networks. 
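
For intuition, the minimum-unbalanced-cut problem can be solved by brute force on a toy graph. The
sketch below is only a reference implementation for tiny instances (it enumerates all sets of size
at most $k$); the weighted graph, made of two triangles joined by a light edge, and the helper names
are illustrative assumptions.

```python
from itertools import combinations

def cut_function(edges):
    """Weighted cut function of an undirected graph given as (u, v, weight) triples."""
    def f(S):
        S = set(S)
        return sum(w for u, v, w in edges if (u in S) != (v in S))
    return f

def min_unbalanced_cut(f, nodes, k):
    """Brute force: a non-empty set of at most k nodes with minimum cut value."""
    best = None
    for size in range(1, k + 1):
        for S in combinations(nodes, size):
            if best is None or f(S) < f(best):
                best = S
    return best

edges = [(0, 1, 3), (1, 2, 3), (0, 2, 3),      # heavy triangle
         (3, 4, 2), (4, 5, 2), (3, 5, 2),      # lighter triangle
         (2, 3, 1)]                            # bridge between the two
nodes = range(6)
f = cut_function(edges)
for k in (2, 3):
    S = min_unbalanced_cut(f, nodes, k)
    print("k =", k, "set =", S, "cut value =", f(S))
```

With $k = 3$ a whole triangle fits within the budget and only the bridge is cut, whereas with
$k = 2$ every feasible set must cut through a triangle.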

A slight generalization of minimum unbalanced cut was studied in [5]. There, they consider the
"sourced" version, where a designated node $s$ is required to lie in the side of the cut of interest
(the side with at most $k$ nodes). They show that this sourced-min-unbalanced-cut problem is NP-hard
by reductions from at-most-$k$-densest subgraph and max-clique. Moreover, they give an algorithm
achieving a bicriteria result parametrized by $\alpha > 1$: they find a cut of capacity at most
$\alpha$ times that of the optimal unbalanced cut, yet violating the cardinality constraint by a
factor of up to $\alpha/(\alpha - 1)$. When $\alpha = 2$, this gives a 2-approximation algorithm
that overflows the constraint by a factor of at most 2. Their techniques do not directly yield a
constant approximation algorithm for the problem without violating the constraint.

4.3.1 A 2-approximation algorithm 

In this section, we show a 2-approximation algorithm for minimizing a nonnegative, symmetric
submodular function subject to a cardinality constraint. Without loss of generality, we assume the
constraint is an upper bound of $k$ on the cardinality of the set. The algorithm operates in the
value query model, and runs in polynomial time. This result is stronger than the result in [5] in
two ways: it applies to general nonnegative symmetric submodular functions rather than just graph
cut functions, and it achieves a constant factor approximation without violating the constraint. The
reader may notice, however, that this problem as stated is not strictly more general than the
"sourced" problem considered in [5]. We leave open the question of whether a similar guarantee is
possible for the sourced problem.

We will now argue that Algorithm 1 runs in polynomial time. Step 2 can be completed in polynomial
time by standard convex optimization techniques. For step 3, the polynomial-time construction in
Section 3.1.2 computes an explicit representation of $D^L(x)$. Moreover, from Section 3.1.2 we know
that $D^L(x)$ has a support of size at most $n + 1$, and thus steps 4 and 7 can be completed in
polynomial time. It is then easy to see that the entire algorithm terminates in polynomial time.

Next, we argue correctness by nondeterministically stepping through the algorithm. Let $S^*$ denote
the optimal solution to the problem, with $f(S^*) = OPT$. First, assume the algorithm guesses some
$v_1 \in S^*$. Since $\mathcal{L}_f$ is an extension of $f$ and $S^*$ has cardinality at most $k$,
step 2 computes $x$ with $\mathcal{L}_f(x) \leq OPT$. Moreover, we know from Section 3.1.2 that
$\mathcal{L}_f(x)$ is the expected value of $f$ over draws from $D^L(x)$.

If $S$ with $|S| \leq k$ and $f(S) \leq 2\mathcal{L}_f(x)$ is found in step 4, then we terminate
correctly with a 2-approximation. Otherwise, we can show that step 7 finds $S'$ with
$f(S') \leq OPT$.

Lemma 4.5. Either there exists $S$ in the support of $D^L(x)$ with $|S| \leq k$ and
$f(S) \leq 2\mathcal{L}_f(x)$, or there exists $S'$ in the support of $D^L(x)$ with
$|S'| \leq 2k$ and $f(S') \leq \mathcal{L}_f(x)$.

Algorithm 1 2-approximation for minimizing nonnegative, symmetric, submodular $f$ subject to
cardinality constraint.

Require: $f : 2^X \to \mathbb{R}$ a nonnegative, symmetric, submodular function given by a value
         oracle. Integer $k$ such that $0 < k < n$.
Ensure: $Q$ is a 2-approximate minimizer of $f$ over non-empty sets of size at most $k$
 1: for all $v_1 \in X$ do
 2:   Find $x \in [0,1]^X$ minimizing Lovász extension $\mathcal{L}_f$ subject to $x(v_1) = 1$ and $\mathbf{1} \cdot x \leq k$.
 3:   Construct the Lovász extension distribution $D^L(x)$ corresponding to point $x$.
 4:   if there is $S$ in the support of $D^L(x)$ with $|S| \leq k$ and $f(S) \leq 2\mathcal{L}_f(x)$ then
 5:     return $S$
 6:   else
 7:     Find $S'$ in the support of $D^L(x)$ minimizing $f(S')$ subject to $|S'| \leq 2k$
 8:     for all $v_2 \in S' \setminus \{v_1\}$ do
 9:       Using submodular minimization, find $T$ minimizing $f(T)$ subject to $v_1 \in T$ and $v_2 \notin T$.
10:       if $|T \cap S'| \leq k$ then
11:         $Q_{v_1,v_2} := T \cap S'$
12:       else
13:         $Q_{v_1,v_2} := \overline{T} \cap S'$
14:       end if
15:     end for
16:   end if
17: end for
18: $Q := \operatorname{argmin}_{v_1,v_2} f(Q_{v_1,v_2})$
19: return $Q$
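
Steps 3, 4 and 7 of Algorithm 1, together with the dichotomy of Lemma 4.5, can be illustrated on a
toy instance. The sketch below is hedged in several ways: the point `x` is supplied by hand rather
than computed by the convex minimization of step 2, $D^L(x)$ is assumed to be the standard threshold
distribution, steps 8 to 13 (which need a submodular minimization routine) are left out, and the
graph (two 4-cliques joined by a light edge) and all function names are hypothetical.

```python
from itertools import combinations

def cut_function(edges):
    """Weighted cut function of an undirected graph given as (u, v, weight) triples."""
    def f(S):
        S = set(S)
        return sum(w for u, v, w in edges if (u in S) != (v in S))
    return f

def lovasz_distribution(x):
    """Threshold sets of x with their probabilities (support size at most n + 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])
    levels = [1.0] + [x[i] for i in order] + [0.0]
    return [(levels[m] - levels[m + 1], frozenset(order[:m]))
            for m in range(n + 1) if levels[m] - levels[m + 1] > 0]

def steps_3_4_7(f, x, k):
    """Return which step succeeds, the set it finds, and the Lovasz extension value."""
    dist = lovasz_distribution(x)                                   # step 3
    value = sum(p * f(S) for p, S in dist)                          # L_f(x)
    small = [S for _, S in dist if len(S) <= k and f(S) <= 2 * value]
    if small:                                                       # step 4
        return "step 4", min(small, key=f), value
    # Step 7: by Lemma 4.5 this list is non-empty whenever sum(x) <= k.
    medium = [S for _, S in dist if len(S) <= 2 * k]
    return "step 7", min(medium, key=f), value

edges = ([(u, v, 10) for u, v in combinations(range(4), 2)] +
         [(u, v, 10) for u, v in combinations(range(4, 8), 2)] +
         [(3, 4, 1)])
f = cut_function(edges)
k = 3
x = [1.0, 2 / 3, 2 / 3, 2 / 3, 0.0, 0.0, 0.0, 0.0]   # x(v_1) = 1 for v_1 = 0, sum(x) = k
which, found, value = steps_3_4_7(f, x, k)
print(which, sorted(found), f(found), value)
```

Here every set of size at most $k$ in the support is expensive, so step 4 fails and the algorithm
falls back to a cheap set $S'$ of size between $k$ and $2k$, which steps 8 to 13 would then split.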






Proof. Assume not. It is now easy to check that each set $R$ in the support of $D^L(x)$ has

$$f(R) > \left(2 - \frac{|R|}{k}\right) \mathcal{L}_f(x)$$

Taking expectations, we get that

$$\mathbb{E}[f(R)] > \left(2 - \frac{\mathbb{E}[|R|]}{k}\right) \mathcal{L}_f(x) \geq \mathcal{L}_f(x)$$

The last inequality follows from the fact that the expected value of $|R|$ is at most $k$, by
definition of $D^L(x)$. This is a contradiction, since by definition the expectation of $f$ over
draws from $D^L(x)$ is precisely $\mathcal{L}_f(x)$. □

Now, assuming no appropriate $S$ was found in step 4, we have $S'$ as in the statement of Lemma 4.5
with $k < |S'| \leq 2k$ and $v_1 \in S'$. Since $|S^*| \leq k$, we know that there exists
$v_2 \in S'$ such that $v_2 \notin S^*$. In particular, there exists a set containing $v_1$ and not
containing $v_2$ with value at most $OPT$. Assume the algorithm guesses such a $v_2$. This
immediately yields the following Lemma.

Lemma 4.6. If $v_1 \in S^*$ and $v_2 \in S' \setminus S^*$ then step 9 finds $T$ such that
$f(T) \leq OPT$.

Therefore, combining Lemmas 4.5 and 4.6 we get the following from submodularity and nonnegativity:

$$f(T \cap S') \leq f(T \cap S') + f(T \cup S') \leq f(T) + f(S') \leq OPT + OPT = 2\,OPT$$

Moreover, we know by symmetry of $f$ that $f(\overline{T}) = f(T) \leq OPT$. Therefore, by the same
calculation we get $f(\overline{T} \cap S') \leq 2\,OPT$. Now, observe that $T \cap S'$ and
$\overline{T} \cap S'$ partition $S'$ into non-trivial subsets by definition of $T$. This gives that
the smaller of the two, $Q_{v_1,v_2}$, has cardinality between 1 and $k$, and moreover
$f(Q_{v_1,v_2}) \leq 2\,OPT$. The algorithm tries all $v_1$ and $v_2$, so this immediately yields
the Theorem.

Theorem 4.7. Algorithm 1 is a polynomial-time 2-approximation algorithm for minimizing a
nonnegative, symmetric, submodular function subject to a cardinality constraint in the value oracle
model.

Conclusion In this survey, we considered various continuous extensions of submodular functions.
We observed that those extensions yielding algorithmic utility are often associated with natural,
even oblivious distributions on the ground set. We presented a unified treatment of two existing
algorithmic results, one on minimization and one on maximization, using this distributional lens.
Moreover, we demonstrated the power of this paradigm by obtaining a new result for constrained
minimization of submodular functions.